Photo credit: Nino Marcutti/Alamy Stock
In the spring of 2020, New York City (NYC) became the epicenter of the global COVID-19 pandemic as its residents were forced to shelter in place and economic activity came to a grinding halt. The City's Department of Health and Mental Hygiene (DOHMH) tracks the daily and aggregate number of COVID-19 cases in NYC and publishes the data on its GitHub repository. However, because DOHMH provides only raw data (in CSV format), the data are difficult to digest and case trends in the city are hard to detect.
This work seeks to extract, transform and analyze the daily and aggregate cases reported by DOHMH. It uses visuals to depict trends in COVID-19 infections, hospitalizations and deaths across the city. It also examines case trends by borough, demographic group and neighborhood to understand which groups are most affected by the pandemic.
The analysis will be updated at the beginning of each week as new data become available, allowing continuous monitoring of COVID-19 trends in NYC.
NYC DOHMH publishes an open-source COVID-19 database on its GitHub repository. The database, which is updated daily, contains numerous tables that provide details about COVID-19 cases, testing and vaccinations. This analysis uses three data sets from the repository: data-by-day, data-by-group and data-by-modzcta. Below are brief descriptions of each data set.
data-by-day: Provides a daily summary of all COVID-19 cases, hospitalizations and deaths in the City as a whole and by borough.
data-by-group: Provides a breakdown of the total number of cases, hospitalizations and deaths by demographic group, including borough, age, gender and race.
data-by-modzcta: Gives a breakdown of aggregate cases by neighborhood and modified ZIP Code Tabulation Area (MODZCTA). When combined with the MODZCTA shapefiles (available from DOHMH's GitHub or the NYC Open Data Portal), this data can be used to map COVID-19 cases and deaths by neighborhood.
In addition to the three data sets above, the analysis extracts shapefile data from the City's Open Data Portal to map COVID-19 cases by neighborhood.
Now, let us extract and load these data sets from the DOHMH GitHub page and the NYC Open Data Portal and prepare them for the analysis.
## Reading layer `geo_export_65e02036-af81-4de3-b7b0-5e1f1e2e0a3e' from data source `/Users/aly_will_mac/Desktop/OLD PC/WILL/LEARNING/1. ALL PROJECTS/R-NYC-COVID-Stats/Shape Files/geo_export_65e02036-af81-4de3-b7b0-5e1f1e2e0a3e.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 178 features and 4 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -74.25559 ymin: 40.49612 xmax: -73.70001 ymax: 40.91553
## Geodetic CRS: WGS84(DD)
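The loading code behind the R console output above is not shown. The following is a minimal Python sketch of the same step using only the standard library. It assumes the DOHMH files are plain comma-separated text with a header row. The exact raw-file URLs are not given in the text, so the hypothetical `load_table` helper reads from any open text handle (a local file or a downloaded response) rather than from a specific address.

```python
import csv
import io
from typing import Iterable, Optional, TextIO


def load_table(handle: TextIO, numeric_cols: Iterable[str] = ()) -> list:
    """Read a CSV with a header row into a list of dicts (hypothetical helper).

    Columns named in `numeric_cols` are converted to float; blank cells become
    None so that missing values stay visible for the cleaning step.
    """
    numeric = set(numeric_cols)
    rows = []
    for row in csv.DictReader(handle):
        for col in numeric:
            raw: Optional[str] = row.get(col)
            row[col] = float(raw) if raw and raw.strip() else None
        rows.append(row)
    return rows


# Usage with an in-memory stand-in for data-by-day (toy values, not DOHMH data).
sample = io.StringIO(
    "date_of_interest,CASE_COUNT,DEATH_COUNT\n"
    "01/01/2021,10,1\n"
    "01/02/2021,12,\n"
)
daily = load_table(sample, numeric_cols=["CASE_COUNT", "DEATH_COUNT"])
print(daily[1])
# {'date_of_interest': '01/02/2021', 'CASE_COUNT': 12.0, 'DEATH_COUNT': None}
```

Keeping blanks as None, rather than coercing them to 0, keeps gaps like those in the group data visible for the missing-value check.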
The tables below show the first few rows of each data set.
In this section, I examine the data sets to identify what needs to be cleaned.
The tables below depict the structure and summary of the three COVID data sets.
| skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| character | date_of_interest | 0 | 1 | 10 | 10 | 0 | 1060 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 2527.1245283 | 5078.2973956 | 0 | 617.75 | 1539.5 | 2863.00 | 54999 | ▇▁▁▁▁ |
| numeric | PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 492.0377358 | 635.9121400 | 0 | 90.00 | 377.0 | 671.25 | 5882 | ▇▁▁▁▁ |
| numeric | HOSPITALIZED_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 175.3084906 | 248.1974756 | 1 | 48.00 | 104.0 | 190.25 | 1842 | ▇▁▁▁▁ |
| numeric | DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 35.9528302 | 77.1863059 | 0 | 7.00 | 13.0 | 32.00 | 598 | ▇▁▁▁▁ |
| numeric | PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 5.9783019 | 24.7973182 | 0 | 0.00 | 1.0 | 2.00 | 240 | ▇▁▁▁▁ |
| numeric | CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 2523.2584906 | 4662.9357200 | 0 | 630.25 | 1581.0 | 2865.25 | 39493 | ▇▁▁▁▁ |
| numeric | ALL_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 3013.9584906 | 5216.2333481 | 0 | 778.75 | 2003.0 | 3622.50 | 43950 | ▇▁▁▁▁ |
| numeric | HOSP_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 175.1009434 | 243.8682523 | 0 | 48.00 | 106.0 | 190.00 | 1663 | ▇▁▁▁▁ |
| numeric | DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 35.9094340 | 76.2970520 | 0 | 8.00 | 12.0 | 32.00 | 566 | ▇▁▁▁▁ |
| numeric | ALL_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 41.8867925 | 99.7462144 | 0 | 8.00 | 13.0 | 34.25 | 775 | ▇▁▁▁▁ |
| numeric | BX_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 417.1660377 | 948.3834150 | 0 | 80.75 | 211.5 | 443.50 | 10559 | ▇▁▁▁▁ |
| numeric | BX_PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 96.9745283 | 145.9417346 | 0 | 13.00 | 65.0 | 133.00 | 1575 | ▇▁▁▁▁ |
| numeric | BX_HOSPITALIZED_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 37.7707547 | 57.4190837 | 0 | 9.00 | 20.0 | 40.00 | 390 | ▇▁▁▁▁ |
| numeric | BX_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 6.7990566 | 16.1710112 | 0 | 1.00 | 2.0 | 5.00 | 132 | ▇▁▁▁▁ |
| numeric | BX_PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 1.1603774 | 5.1219994 | 0 | 0.00 | 0.0 | 0.00 | 46 | ▇▁▁▁▁ |
| numeric | BX_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 416.4349057 | 851.9053169 | 0 | 81.75 | 236.0 | 461.25 | 7479 | ▇▁▁▁▁ |
| numeric | BX_PROBABLE_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 96.6896226 | 134.1724500 | 0 | 14.00 | 73.0 | 138.25 | 1094 | ▇▁▁▁▁ |
| numeric | BX_ALL_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 513.1311321 | 977.6351710 | 0 | 107.00 | 307.0 | 602.25 | 8573 | ▇▁▁▁▁ |
| numeric | BX_HOSPITALIZED_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 37.7132075 | 55.9878310 | 0 | 9.00 | 21.0 | 39.00 | 358 | ▇▁▁▁▁ |
| numeric | BX_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 6.8094340 | 15.8448422 | 0 | 1.00 | 2.0 | 5.00 | 117 | ▇▁▁▁▁ |
| numeric | BX_ALL_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 7.9584906 | 20.6659700 | 0 | 1.00 | 2.0 | 5.00 | 158 | ▇▁▁▁▁ |
| numeric | BK_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 761.7547170 | 1498.3876504 | 0 | 215.00 | 464.0 | 849.50 | 16664 | ▇▁▁▁▁ |
| numeric | BK_PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 134.7622642 | 177.6639518 | 0 | 28.00 | 102.0 | 177.25 | 1906 | ▇▁▁▁▁ |
| numeric | BK_HOSPITALIZED_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 53.1433962 | 72.9644525 | 0 | 16.00 | 31.0 | 56.25 | 555 | ▇▁▁▁▁ |
| numeric | BK_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 11.2283019 | 23.8219067 | 0 | 2.00 | 4.0 | 10.00 | 201 | ▇▁▁▁▁ |
| numeric | BK_PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 2.0226415 | 8.5817436 | 0 | 0.00 | 0.0 | 1.00 | 92 | ▇▁▁▁▁ |
| numeric | BK_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 760.6339623 | 1382.4148767 | 0 | 226.50 | 481.0 | 858.00 | 11586 | ▇▁▁▁▁ |
| numeric | BK_PROBABLE_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 134.4273585 | 164.9582641 | 0 | 27.75 | 107.0 | 175.25 | 1213 | ▇▁▁▁▁ |
| numeric | BK_ALL_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 895.0792453 | 1536.0343325 | 0 | 259.00 | 593.5 | 1043.50 | 12786 | ▇▁▁▁▁ |
| numeric | BK_HOSPITALIZED_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 53.0792453 | 71.3291016 | 0 | 17.00 | 32.0 | 53.00 | 490 | ▇▁▁▁▁ |
| numeric | BK_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 11.2245283 | 23.3750180 | 0 | 2.00 | 4.0 | 10.00 | 178 | ▇▁▁▁▁ |
| numeric | BK_ALL_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 13.2415094 | 31.3377557 | 0 | 3.00 | 4.0 | 11.00 | 252 | ▇▁▁▁▁ |
| numeric | MN_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 463.4301887 | 920.8203633 | 0 | 105.00 | 287.0 | 495.00 | 9113 | ▇▁▁▁▁ |
| numeric | MN_PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 90.9981132 | 114.9317078 | 0 | 18.00 | 70.5 | 125.25 | 972 | ▇▁▁▁▁ |
| numeric | MN_HOSPITALIZED_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 26.5235849 | 36.3131175 | 0 | 7.00 | 16.0 | 31.00 | 275 | ▇▁▁▁▁ |
| numeric | MN_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 4.9179245 | 10.0653136 | 0 | 1.00 | 2.0 | 5.00 | 92 | ▇▁▁▁▁ |
| numeric | MN_PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 0.8198113 | 3.2486772 | 0 | 0.00 | 0.0 | 0.00 | 33 | ▇▁▁▁▁ |
| numeric | MN_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 462.7660377 | 839.4616940 | 0 | 119.00 | 316.0 | 488.50 | 6394 | ▇▁▁▁▁ |
| numeric | MN_PROBABLE_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 90.7924528 | 108.4387633 | 0 | 17.00 | 75.0 | 129.00 | 766 | ▇▁▁▁▁ |
| numeric | MN_ALL_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 553.5547170 | 941.3468521 | 0 | 146.50 | 380.0 | 604.00 | 7160 | ▇▁▁▁▁ |
| numeric | MN_HOSPITALIZED_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 26.4820755 | 35.3165459 | 0 | 6.00 | 17.0 | 31.00 | 228 | ▇▁▁▁▁ |
| numeric | MN_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 4.9094340 | 9.7595892 | 0 | 1.00 | 2.0 | 4.00 | 73 | ▇▁▁▁▁ |
| numeric | MN_ALL_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 5.7188679 | 12.7291173 | 0 | 1.00 | 2.0 | 5.00 | 100 | ▇▁▁▁▁ |
| numeric | QN_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 705.7452830 | 1430.4416288 | 0 | 146.00 | 407.0 | 802.25 | 15221 | ▇▁▁▁▁ |
| numeric | QN_PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 136.2103774 | 171.6479589 | 0 | 21.00 | 100.5 | 195.00 | 1609 | ▇▁▁▁▁ |
| numeric | QN_HOSPITALIZED_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 49.2679245 | 76.7668675 | 0 | 13.00 | 27.0 | 52.00 | 609 | ▇▁▁▁▁ |
| numeric | QN_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 10.7547170 | 24.3999997 | 0 | 2.00 | 4.0 | 9.00 | 202 | ▇▁▁▁▁ |
| numeric | QN_PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 1.7169811 | 7.3487965 | 0 | 0.00 | 0.0 | 1.00 | 68 | ▇▁▁▁▁ |
| numeric | QN_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 704.6490566 | 1322.4917847 | 0 | 149.75 | 436.5 | 828.00 | 11550 | ▇▁▁▁▁ |
| numeric | QN_PROBABLE_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 135.8471698 | 161.0818396 | 0 | 21.00 | 104.0 | 198.25 | 1220 | ▇▁▁▁▁ |
| numeric | QN_ALL_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 840.4971698 | 1469.4601416 | 0 | 184.00 | 545.5 | 1052.75 | 12687 | ▇▁▁▁▁ |
| numeric | QN_HOSPITALIZED_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 49.2264151 | 75.3578411 | 0 | 13.00 | 28.0 | 52.00 | 562 | ▇▁▁▁▁ |
| numeric | QN_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 10.7396226 | 24.0074771 | 0 | 2.00 | 4.0 | 9.00 | 177 | ▇▁▁▁▁ |
| numeric | QN_ALL_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 12.4603774 | 30.9072959 | 0 | 2.00 | 4.0 | 10.00 | 240 | ▇▁▁▁▁ |
| numeric | SI_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 178.1754717 | 332.4948406 | 0 | 42.00 | 111.0 | 197.25 | 3720 | ▇▁▁▁▁ |
| numeric | SI_PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 33.0283019 | 36.5226352 | 0 | 5.75 | 26.0 | 49.00 | 316 | ▇▁▁▁▁ |
| numeric | SI_HOSPITALIZED_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 10.7018868 | 11.8029860 | 0 | 3.00 | 7.0 | 14.00 | 83 | ▇▂▁▁▁ |
| numeric | SI_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 2.2509434 | 3.8622711 | 0 | 0.00 | 1.0 | 3.00 | 34 | ▇▁▁▁▁ |
| numeric | SI_PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 0.2584906 | 0.9819781 | 0 | 0.00 | 0.0 | 0.00 | 9 | ▇▁▁▁▁ |
| numeric | SI_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 177.9216981 | 305.2646687 | 0 | 42.00 | 118.0 | 199.25 | 2686 | ▇▁▁▁▁ |
| numeric | SI_PROBABLE_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 32.8792453 | 33.9528160 | 0 | 6.00 | 27.0 | 50.00 | 233 | ▇▃▁▁▁ |
| numeric | SI_ALL_CASE_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 210.7924528 | 334.0528820 | 0 | 48.75 | 149.0 | 250.00 | 2906 | ▇▁▁▁▁ |
| numeric | SI_HOSPITALIZED_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 10.6811321 | 11.2766827 | 0 | 3.00 | 8.0 | 13.00 | 72 | ▇▂▁▁▁ |
| numeric | SI_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 2.2301887 | 3.5835942 | 0 | 1.00 | 1.0 | 2.00 | 26 | ▇▁▁▁▁ |
| numeric | SI_ALL_DEATH_COUNT_7DAY_AVG | 0 | 1 | NA | NA | NA | NA | NA | 2.4990566 | 4.3994376 | 0 | 1.00 | 1.0 | 3.00 | 34 | ▇▁▁▁▁ |
| numeric | INCOMPLETE | 0 | 1 | NA | NA | NA | NA | NA | 402.6320755 | 4940.5862792 | 0 | 0.00 | 0.0 | 0.00 | 60970 | ▇▁▁▁▁ |
| skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| character | group | 0 | 1.0000000 | 3 | 9 | 0 | 6 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
| character | subgroup | 0 | 1.0000000 | 0 | 22 | 1 | 27 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | CONFIRMED_CASE_RATE | 1 | 0.9629630 | NA | NA | NA | NA | NA | 30503.3977 | 4679.7023 | 19004.50 | 27864.062 | 30666.865 | 33052.327 | 39665.81 | ▁▃▇▅▃ |
| numeric | CASE_RATE | 1 | 0.9629630 | NA | NA | NA | NA | NA | 36444.4442 | 5691.9720 | 22559.28 | 33003.207 | 36720.275 | 39326.058 | 47018.65 | ▁▃▇▅▃ |
| numeric | HOSPITALIZED_RATE | 1 | 0.9629630 | NA | NA | NA | NA | NA | 2437.4304 | 1997.9331 | 241.72 | 1439.553 | 2357.615 | 2585.832 | 10635.32 | ▇▇▁▁▁ |
| numeric | DEATH_RATE | 3 | 0.8888889 | NA | NA | NA | NA | NA | 658.6129 | 824.7574 | 2.70 | 358.265 | 542.565 | 607.050 | 4235.86 | ▇▁▁▁▁ |
| numeric | CONFIRMED_CASE_COUNT | 1 | 0.9629630 | NA | NA | NA | NA | NA | 592645.0000 | 545165.0400 | 99530.00 | 272624.750 | 435353.000 | 702846.250 | 2678752.00 | ▇▃▁▁▁ |
| numeric | PROBABLE_CASE_COUNT | 1 | 0.9629630 | NA | NA | NA | NA | NA | 115452.8462 | 106379.3686 | 18617.00 | 56596.500 | 88915.000 | 141340.750 | 521560.00 | ▇▂▁▁▁ |
| numeric | CASE_COUNT | 1 | 0.9629630 | NA | NA | NA | NA | NA | 708097.8462 | 651324.5863 | 118147.00 | 329874.250 | 526533.000 | 847702.250 | 3200312.00 | ▇▃▁▁▁ |
| numeric | HOSPITALIZED_COUNT | 1 | 0.9629630 | NA | NA | NA | NA | NA | 46099.4231 | 42720.9188 | 1665.00 | 17391.750 | 37600.500 | 58816.500 | 203792.00 | ▇▅▁▁▁ |
| numeric | DEATH_COUNT | 3 | 0.8888889 | NA | NA | NA | NA | NA | 12205.3750 | 10682.4942 | 46.00 | 5244.500 | 9365.500 | 19642.250 | 44585.00 | ▇▂▅▁▁ |
| skim_type | skim_variable | n_missing | complete_rate | character.min | character.max | character.empty | character.n_unique | character.whitespace | numeric.mean | numeric.sd | numeric.p0 | numeric.p25 | numeric.p50 | numeric.p75 | numeric.p100 | numeric.hist |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| character | NEIGHBORHOOD_NAME | 0 | 1 | 7 | 59 | 0 | 162 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
| character | BOROUGH_GROUP | 0 | 1 | 5 | 13 | 0 | 5 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
| character | label | 0 | 1 | 5 | 12 | 0 | 177 | 0 | NA | NA | NA | NA | NA | NA | NA | NA |
| numeric | MODIFIED_ZCTA | 0 | 1 | NA | NA | NA | NA | NA | 10810.37853 | 5.781733e+02 | 10001.00000 | 10301.00000 | 11109.00000 | 11361.00000 | 11697.00000 | ▇▃▁▇▇ |
| numeric | lat | 0 | 1 | NA | NA | NA | NA | NA | 40.72555 | 8.364830e-02 | 40.50777 | 40.67082 | 40.72644 | 40.77643 | 40.89951 | ▁▅▇▇▃ |
| numeric | lon | 0 | 1 | NA | NA | NA | NA | NA | -73.91881 | 9.965940e-02 | -74.24227 | -73.97870 | -73.92405 | -73.84698 | -73.71091 | ▁▁▇▆▃ |
| numeric | COVID_CONFIRMED_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 14516.59322 | 8.186873e+03 | 1038.00000 | 8408.00000 | 13499.00000 | 20404.00000 | 33988.00000 | ▆▇▆▅▂ |
| numeric | COVID_PROBABLE_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 2861.22034 | 1.539180e+03 | 207.00000 | 1785.00000 | 2616.00000 | 3981.00000 | 7082.00000 | ▅▇▅▃▁ |
| numeric | COVID_CASE_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 17377.81356 | 9.605003e+03 | 1292.00000 | 10259.00000 | 15948.00000 | 24337.00000 | 39937.00000 | ▆▇▆▅▂ |
| numeric | COVID_CONFIRMED_CASE_RATE | 0 | 1 | NA | NA | NA | NA | NA | 30934.32780 | 4.638634e+03 | 19013.04000 | 27863.59000 | 30245.67000 | 33423.36000 | 48296.47000 | ▁▇▆▁▁ |
| numeric | COVID_CASE_RATE | 0 | 1 | NA | NA | NA | NA | NA | 37235.82288 | 5.170696e+03 | 24264.42000 | 33774.31000 | 36345.32000 | 39695.18000 | 58541.63000 | ▁▇▃▁▁ |
| numeric | POP_DENOMINATOR | 0 | 1 | NA | NA | NA | NA | NA | 47100.66136 | 2.615157e+04 | 2972.12000 | 27180.77000 | 42737.28000 | 66856.31000 | 110369.78000 | ▅▇▅▃▂ |
| numeric | COVID_CONFIRMED_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 211.45763 | 1.524139e+02 | 0.00000 | 91.00000 | 170.00000 | 314.00000 | 781.00000 | ▇▅▃▁▁ |
| numeric | COVID_PROBABLE_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 35.23729 | 2.660870e+01 | 0.00000 | 15.00000 | 28.00000 | 48.00000 | 109.00000 | ▇▆▃▂▁ |
| numeric | COVID_DEATH_COUNT | 0 | 1 | NA | NA | NA | NA | NA | 246.69492 | 1.770593e+02 | 1.00000 | 110.00000 | 201.00000 | 364.00000 | 884.00000 | ▇▅▃▂▁ |
| numeric | COVID_CONFIRMED_DEATH_RATE | 0 | 1 | NA | NA | NA | NA | NA | 425.20243 | 1.885513e+02 | 0.00000 | 330.23000 | 422.10000 | 521.66000 | 1305.45000 | ▃▇▃▁▁ |
| numeric | COVID_DEATH_RATE | 0 | 1 | NA | NA | NA | NA | NA | 496.28921 | 2.200567e+02 | 11.42000 | 380.20000 | 489.32000 | 618.73000 | 1528.15000 | ▃▇▃▁▁ |
| numeric | PERCENT_POSITIVE | 0 | 1 | NA | NA | NA | NA | NA | 24.86322 | 4.654500e+00 | 7.89000 | 22.61000 | 25.40000 | 27.18000 | 36.33000 | ▁▁▆▇▁ |
| numeric | TOTAL_COVID_TESTS | 0 | 1 | NA | NA | NA | NA | NA | 53736.57062 | 2.932419e+04 | 4242.00000 | 30822.00000 | 48733.00000 | 75421.00000 | 129062.00000 | ▆▇▆▃▂ |
From the tables above, the main issue to address is missing values, and only the group data has any. Let us check again to make sure.
As the charts show, about four percent of the observations in the group data are missing. On closer inspection, all of the missing observations belong to the Age category in the group column. In Section 4.1, I recode the age groups in the subgroup column.
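The n_missing and complete_rate columns in the skim tables are simple per-column tallies. The values 0.9629630 and 0.8888889 reported for the group data correspond to 1 and 3 missing cells out of 27 rows (26/27 and 24/27). One missing row in 27 is about 3.7%, in line with "about four percent". Below is a minimal Python sketch of the same check. It uses a hypothetical helper name and assumes rows are dicts in which a missing cell is None.

```python
def missing_summary(rows, columns):
    """Return {column: (n_missing, complete_rate)} in the spirit of skimr's output.

    A cell counts as missing when its value is None.
    """
    total = len(rows)
    summary = {}
    for col in columns:
        n_missing = sum(1 for row in rows if row.get(col) is None)
        complete_rate = (total - n_missing) / total if total else float("nan")
        summary[col] = (n_missing, complete_rate)
    return summary


# Toy table shaped like the group data: 27 rows, 1 gap in CASE_RATE, 3 in DEATH_RATE.
rows = [{"CASE_RATE": 1.0, "DEATH_RATE": 1.0} for _ in range(27)]
rows[0]["CASE_RATE"] = None
for i in range(1, 4):
    rows[i]["DEATH_RATE"] = None

for col, (n_missing, rate) in missing_summary(rows, ["CASE_RATE", "DEATH_RATE"]).items():
    print(col, n_missing, round(rate, 7))
# CASE_RATE 1 0.962963
# DEATH_RATE 3 0.8888889
```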
To get the data ready for the analysis, I clean and manipulate them as follows.
group data: I combined some of the age categories into one to remove the missing values. I also corrected the borough name from StatenIsland to Staten Island.
daily data: I changed the data type of the date_of_interest variable from character to date.
Under the Age group category, the 0-17 group has three subgroups (0-4, 5-12, 13-17). However, the DEATH_RATE and DEATH_COUNT statistics are provided only for the 0-17 age group. All other COVID-19 statistics are provided only for the age subgroups, not for the main 0-17 category. This creates missing values in the rows containing these age categories, as shown below.
To handle the missing data, I use the rollsumr function (from R's zoo package) to aggregate the statistics of the three subgroups (0-4, 5-12, 13-17) into the main 0-17 category. The subgroups are then deleted from the table.
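The recoding code itself is not shown. The Python sketch below is one way to do the same aggregation. It is not the author's R code: the subgroup labels are assumed, and only two count/rate pairs are handled for brevity (the real table has more). Counts can simply be added, but rates per 100,000 cannot, because adding the three subgroup case rates would overstate the 0-17 rate. The sketch therefore backs out each subgroup's population as count × 100,000 / rate and recomputes the 0-17 rates from the summed counts and summed population. The function name and toy numbers are hypothetical.

```python
SUBGROUPS = ("0-4", "5-12", "13-17")  # assumed labels of the three age subgroups
PAIRS = {"CASE_COUNT": "CASE_RATE", "HOSPITALIZED_COUNT": "HOSPITALIZED_RATE"}


def collapse_age_rows(rows, parent="0-17", children=SUBGROUPS, pairs=PAIRS):
    """Fold the child age rows into the parent row, then drop the children.

    The parent keeps its own DEATH_* values; counts are summed, and each rate is
    recomputed from the summed counts over the implied total population.
    """
    kids = [r for r in rows if r["subgroup"] in children]
    parent_row = next(r for r in rows if r["subgroup"] == parent)
    # Implied population of each child row: count * 100,000 / rate.
    population = sum(r["CASE_COUNT"] * 100_000 / r["CASE_RATE"] for r in kids)
    for count_col, rate_col in pairs.items():
        total = sum(r[count_col] for r in kids)
        parent_row[count_col] = total
        parent_row[rate_col] = total * 100_000 / population
    return [r for r in rows if r["subgroup"] not in children]


# Toy rows (made-up numbers): implied populations are 1,000, 2,000 and 1,000.
rows = [
    {"subgroup": "0-17", "CASE_COUNT": None, "CASE_RATE": None,
     "HOSPITALIZED_COUNT": None, "HOSPITALIZED_RATE": None, "DEATH_COUNT": 2},
    {"subgroup": "0-4", "CASE_COUNT": 100, "CASE_RATE": 10_000,
     "HOSPITALIZED_COUNT": 5, "HOSPITALIZED_RATE": 500},
    {"subgroup": "5-12", "CASE_COUNT": 300, "CASE_RATE": 15_000,
     "HOSPITALIZED_COUNT": 2, "HOSPITALIZED_RATE": 100},
    {"subgroup": "13-17", "CASE_COUNT": 200, "CASE_RATE": 20_000,
     "HOSPITALIZED_COUNT": 3, "HOSPITALIZED_RATE": 300},
]
collapsed = collapse_age_rows(rows)
print(collapsed[0]["CASE_COUNT"], collapsed[0]["CASE_RATE"])  # 600 15000.0
```

Simply adding the three toy case rates would give 45,000 per 100,000, three times the population-weighted 15,000.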
Next, I check whether the recoding took care of the missing values in the group data.
Staten Island subgroup
In the group table, 'Staten Island' is written as StatenIsland, as shown in the table below.
| Borough | Count |
|---|---|
| Bronx | 1 |
| Brooklyn | 1 |
| Manhattan | 1 |
| Queens | 1 |
| StatenIsland | 1 |
I fix it by adding a space to correct the borough name, as shown below.
| Borough | Count |
|---|---|
| Bronx | 1 |
| Brooklyn | 1 |
| Manhattan | 1 |
| Queens | 1 |
| Staten Island | 1 |
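The following is a minimal Python equivalent of this fix. It assumes the borough names sit in a subgroup field of row dicts; the helper name and its mapping argument are hypothetical. A lookup table, rather than a one-off string replacement, makes it easy to add further spelling fixes later.

```python
def fix_borough_names(rows, column="subgroup", fixes=None):
    """Replace misspelled borough names in `column`; other values pass through."""
    fixes = {"StatenIsland": "Staten Island"} if fixes is None else fixes
    for row in rows:
        row[column] = fixes.get(row[column], row[column])
    return rows


boroughs = [{"subgroup": name} for name in
            ["Bronx", "Brooklyn", "Manhattan", "Queens", "StatenIsland"]]
print([row["subgroup"] for row in fix_borough_names(boroughs)])
# ['Bronx', 'Brooklyn', 'Manhattan', 'Queens', 'Staten Island']
```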
date_of_interest
In the daily data, the date_of_interest column is stored as a string. I convert it to a date.
## [1] "Date"
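The R output above confirms that the column's class is now Date. Below is a Python sketch of the same conversion. The "%m/%d/%Y" format is an assumption about how the 10-character date strings are written, so check it against the raw file. `datetime.strptime` raises ValueError on a mismatch rather than guessing, which surfaces a wrong format early.

```python
from datetime import date, datetime


def parse_dates(rows, column="date_of_interest", fmt="%m/%d/%Y"):
    """Convert `column` from strings to datetime.date in place (format is assumed)."""
    for row in rows:
        row[column] = datetime.strptime(row[column], fmt).date()
    return rows


rows = parse_dates([{"date_of_interest": "01/23/2023"}])
print(type(rows[0]["date_of_interest"]).__name__, rows[0]["date_of_interest"])
# date 2023-01-23
print(rows[0]["date_of_interest"] == date(2023, 1, 23))  # True
```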
This section analyzes the daily and total number of COVID cases in the City as a whole as of 2023-01-23.
As of 2023-01-23, approximately 3.2 million COVID-19 infections have been recorded in NYC, and close to 204,000 of them led to hospitalization. A total of 44,585 people have died of COVID-19 in the City.
| Total Infections | Total Hospitalizations | Total Deaths |
|---|---|---|
| 3,200,312 | 203,792 | 44,585 |
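The infection total appears to count confirmed and probable cases together, and the skim summaries earlier make this checkable. In the group data, the largest CONFIRMED_CASE_COUNT, PROBABLE_CASE_COUNT and CASE_COUNT values (presumably the citywide row) are 2,678,752, 521,560 and 3,200,312. In the daily data, each column's total is its mean times the number of rows. Assuming one row for each of the 1,060 dates, this reproduces the first two figures:

$$
\begin{aligned}
\sum_t \mathtt{CASE\_COUNT}_t &= n\,\bar{x} \approx 1060 \times 2527.1245 \approx 2{,}678{,}752 \\
\sum_t \mathtt{PROBABLE\_CASE\_COUNT}_t &\approx 1060 \times 492.0377 \approx 521{,}560 \\
2{,}678{,}752 + 521{,}560 &= 3{,}200{,}312
\end{aligned}
$$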
The charts below show the trends in daily Citywide cases since the beginning of the pandemic.
The charts above show that NYC reached its peak of infections at the beginning of 2022, during the Omicron wave. While there have been three waves of hospitalizations and deaths, most of them occurred during the initial wave of infections (March to April 2020). The availability of vaccines during the Omicron wave appears to have helped reduce the number of hospitalizations and deaths at that time.
The table below shows the number of new infections, hospitalizations and deaths recorded on 2023-01-23, the latest date for which we have records.
| Date | Infections | Hospitalizations | Deaths |
|---|---|---|---|
| 2023-01-23 | 1,759 | 10 | 6 |
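Picking out the latest day is a one-liner once the dates are real date values. Below is a Python sketch with a hypothetical helper name, assuming rows are dicts. If dates were still stored as month-first strings, taking the maximum would compare them as text and could return the wrong day. That is one reason to convert the column first.

```python
from datetime import date


def latest_record(rows, column="date_of_interest"):
    """Return the row with the most recent value in `column`."""
    return max(rows, key=lambda row: row[column])


as_text = [{"date_of_interest": "12/31/2022"}, {"date_of_interest": "01/23/2023"}]
as_dates = [{"date_of_interest": date(2022, 12, 31)},
            {"date_of_interest": date(2023, 1, 23)}]
print(latest_record(as_text)["date_of_interest"])   # 12/31/2022 (wrong: text comparison)
print(latest_record(as_dates)["date_of_interest"])  # 2023-01-23
```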
This section disaggregates the daily and total number of COVID cases among the five NYC boroughs.
The chart below shows the total number of COVID-19 cases by borough: the raw numbers of infections, hospitalizations and deaths since the beginning of the pandemic. Because these are raw numbers (not adjusted for population), boroughs with larger populations will show more infections, hospitalizations and deaths.
The charts below show the trends in the daily average infections, hospitalizations and deaths per borough.
The charts above show that daily infections, hospitalizations and deaths have consistently been highest in Brooklyn and Queens.
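The daily data already include *_7DAY_AVG columns, and DOHMH's exact averaging convention is not described here. The following is a Python sketch that assumes a trailing (right-aligned) window, like the one zoo's rollmeanr computes in R. The first six days have no full window and are left as None.

```python
def trailing_mean(values, window=7):
    """Right-aligned rolling mean; positions without a full window get None."""
    out = []
    running = 0.0
    for i, value in enumerate(values):
        running += value
        if i >= window:
            running -= values[i - window]  # drop the value that left the window
        out.append(running / window if i >= window - 1 else None)
    return out


print(trailing_mean([1, 2, 3, 4, 5, 6, 7, 8, 9], window=7))
# [None, None, None, None, None, None, 4.0, 5.0, 6.0]
```

Keeping a running sum makes each step constant-time instead of re-adding all seven days.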
Section 6.1 shows that Brooklyn has the highest number of cases, hospitalizations and deaths among all boroughs. This makes sense, since Brooklyn is the most populous of the five boroughs. However, to compare boroughs and determine which has been most severely affected, we have to adjust for population. Hence, we use the rate statistics (per 100,000 residents).
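The population adjustment is the usual rate formula: rate = count × 100,000 / population. The following minimal Python sketch uses made-up numbers (not borough data) to show how a smaller place can have fewer cases but a higher rate.

```python
def rate_per_100k(count, population):
    """Events per 100,000 residents."""
    return count * 100_000 / population


# Hypothetical places: B has four times A's cases but ten times its population.
place_a = rate_per_100k(500, 10_000)     # 5,000 per 100,000
place_b = rate_per_100k(2_000, 100_000)  # 2,000 per 100,000
print(place_a > place_b)  # True
```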
Below are the infection, hospitalization and death rates (per 100,000) for each borough.
The chart above indicates that, after adjusting for population, Staten Island (the least populous borough) has the highest infection rate. The Bronx, on the other hand, has the highest hospitalization and death rates.
This section details how COVID-19 has affected NYC residents of different age groups. The data set breaks age down into eight categories: 0-17, 18-24, 25-34, 35-44, 45-54, 55-64, 65-74, and 75+.
The first tab shows the infection, hospitalization and death rates (per 100,000) for the various age groups. The second tab shows hospitalization and death rates as a share of case rates.
The two tabs indicate that, while young people (under 45) are infected at higher rates than any other age group, only a small share of them are hospitalized and very few die from the virus. Seniors, especially those 75 and over, are hospitalized and die at the highest rates even though they have the lowest infection rates. This is consistent with reports that COVID-19 is much deadlier among seniors.
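A hospitalization rate and a case rate for the same group share the same population denominator $P$, so their ratio reduces to hospitalizations per case: $(H/P \cdot 100{,}000) / (C/P \cdot 100{,}000) = H/C$. Below is a Python sketch of the second-tab calculation, using a hypothetical helper and toy numbers.

```python
def severity_shares(case_rate, hospitalized_rate, death_rate):
    """Hospitalization and death rates as fractions of the case rate."""
    return {
        "hospitalized_per_case": hospitalized_rate / case_rate,
        "deaths_per_case": death_rate / case_rate,
    }


# Toy rates per 100,000 for one hypothetical age group.
print(severity_shares(case_rate=40_000, hospitalized_rate=2_000, death_rate=400))
# {'hospitalized_per_case': 0.05, 'deaths_per_case': 0.01}
```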
This section details how COVID-19 has affected people of different racial and ethnic backgrounds. The data set breaks race/ethnicity into four categories: Asian/Pacific-Islander, Black/African-American, Hispanic/Latino and White.
The first tab shows the infection, hospitalization and death rates (per 100,000) for each race/ethnicity.
The second tab shows hospitalization and death rates as a share of case rates.
The two charts indicate that, while African-Americans have one of the lowest infection rates, they are hospitalized and die from the virus at the highest rates.
In this section, I use choropleth maps to visualize and compare infection and death rates (per 100K) among NYC neighborhoods.
To create the maps, I merge the modzcta data frame (which disaggregates total COVID-19 cases by ZIP code and neighborhood) with the MODZCTA shapefile.
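The following is a library-free Python sketch of the join logic only. It is not the R code used for the maps. The shapefile's key field name (`modzcta` below) is an assumption, as is holding the polygon attributes as dicts. Normalizing both keys to five-character strings avoids a silent mismatch between a numeric MODIFIED_ZCTA column and a text key in the shapefile. The shapefile output earlier reports 178 features, while the modzcta table shows 177 distinct labels, so at least one polygon may have no case data. Returning the unmatched keys makes that visible instead of dropping them quietly.

```python
def zip5(value):
    """Normalize a ZIP-like key (10001, 10001.0 or "10001") to a 5-char string."""
    return str(int(float(value))).zfill(5)


def join_by_zip(shapes, cases, shape_key="modzcta", case_key="MODIFIED_ZCTA"):
    """Attach case attributes to each shape record; also report unmatched keys."""
    by_zip = {zip5(row[case_key]): row for row in cases}
    merged, unmatched = [], []
    for shape in shapes:
        key = zip5(shape[shape_key])
        if key in by_zip:
            merged.append({**shape, **by_zip[key]})
        else:
            unmatched.append(key)
    return merged, unmatched


# Toy records, not real MODZCTA values.
shapes = [{"modzcta": "10001"}, {"modzcta": "99999"}]
cases = [{"MODIFIED_ZCTA": 10001.0, "COVID_CASE_RATE": 1234.5}]
merged, unmatched = join_by_zip(shapes, cases)
print(len(merged), unmatched)  # 1 ['99999']
```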
The following are the key trends in reported COVID-19 cases in NYC as of 2023-01-23.
Infections peaked in January 2022, during the Omicron wave.
However, hospitalizations and deaths peaked during the first wave of the pandemic (April 2020). Likely helped by the availability of vaccines, the Omicron wave caused fewer hospitalizations and deaths than the 2020 wave.
Because of the size of its population, Brooklyn has recorded the highest number of infections, hospitalizations and deaths of any borough since the beginning of the pandemic.
Brooklyn and Queens have consistently averaged the highest numbers of daily infections, hospitalizations and deaths since the beginning of the pandemic.
In terms of age, people under 45 have the highest infection rates, yet seniors over 65 are hospitalized and die at the highest rates.
Even though African-Americans have one of the lowest infection rates, they are hospitalized and die at higher rates than other races/ethnicities.
https://webbi1.health.ny.gov/SASStoredProcess/guest?_program=/EBI/PHIG/apps/asthma_dashboard/ad_dashboard&p=it&ind_id=ad16
See https://public.tableau.com/app/profile/w.k.8632/viz/NYCCOVID-19Tracker/DailyConfirmedReportings for an interactive dashboard tracking COVID-19 cases in NYC.